Fix FLOPs inconsistency and add using_sparse_model flag#21743
Fix FLOPs inconsistency and add using_sparse_model flag#21743Devesh-Maheshwari wants to merge 10 commits into
Conversation
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #21743 +/- ##
=======================================
Coverage 87% 87%
=======================================
Files 270 270
Lines 23973 23978 +5
=======================================
+ Hits 20748 20753 +5
Misses 3225 3225 |
deependujha
left a comment
There was a problem hiding this comment.
Hi @Devesh-Maheshwari, thanks for the great work. Added couple of reviews, please have a look. Thanks :)
Co-authored-by: Deependu <deependujha21@gmail.com>
deependujha
left a comment
There was a problem hiding this comment.
thanks for the great work. :)
deependujha
left a comment
There was a problem hiding this comment.
please update changelog file too.
|
Thanks @Devesh-Maheshwari, LGTM. :) |
There was a problem hiding this comment.
Pull request overview
This PR addresses inconsistent GPU peak FLOPs values used for MFU calculations (notably for Hopper GPUs) and introduces an explicit API to opt into sparsity-accelerated peak FLOPs when computing MFU in Fabric’s throughput utilities.
Changes:
- Standardizes Hopper entries in
_CUDA_FLOPSto use dense peak FLOPs (and normalizes FLOPs constants to consistent scientific notation). - Adds
using_sparse_model: Optional[bool]andsparse_cuda_acceleration_factortoThroughput.__init__, with optional scaling ofavailable_flopswhen sparsity is enabled. - Adds tests for sparse scaling/warnings/MFU behavior and updates the Fabric changelog.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/lightning/fabric/utilities/throughput.py |
Adds sparsity-aware FLOPs scaling/warning to Throughput and corrects/normalizes _CUDA_FLOPS entries (especially Hopper). |
tests/tests_fabric/utilities/test_throughput.py |
Adds tests covering sparse scaling, warning behavior, MFU impact, and ThroughputMonitor kwarg forwarding. |
src/lightning/fabric/CHANGELOG.md |
Documents the new Throughput parameters and _CUDA_FLOPS changes (needs wording adjustment). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
What does this PR do?
Fixes #21677.
The FLOPs values in
_CUDA_FLOPSfor Hopper GPUs were inconsistent:h100 sxmandh100 pcieused dense per-GPU peaks, whileh100 nvlandh200 sxm1used the headline "with-sparsity" values from NVIDIA's datasheet. This caused a 2× difference in MFU when switching between GPU types.Changes
Data fix (standardize Hopper on dense):
h100 nvl— replaced sparse values (which were duplicates ofh100 sxm's sparse numbers) with the correct H100 NVL dense per-GPU peaks: 30 / 60 / 417.5 / 835.5 / 835.5 / 1670.5 TFLOPS.h200 sxm1— tensor-core entries now stored as dense halves (494.5 / 989.5 / 989.5 / 1979 TFLOPS) instead of sparse headline values.h100 sxm— rounded entries to exact datasheet ÷ 2 (sub-1% drift only).API addition (per agreed design in issue thread):
Throughput.__init__now acceptsusing_sparse_model: Optional[bool] = Noneandsparse_cuda_acceleration_factor: float = 2.0.None(default): no scaling, emitsrank_zero_warnso users explicitly choose.True: scalesavailable_flopsby the factor.False: dense default, no warning.ThroughputMonitorvia**kwargs.Tests
Four new tests in
tests/tests_fabric/utilities/test_throughput.py:test_throughput_sparse_model_scaling— constructor scales correctly for True / False / None-peak / invalid factor.test_throughput_sparse_model_warning— warning fires only when unset + peak known; silenced by explicit True/False oravailable_flops=None.test_throughput_sparse_model_mfu— end-to-end: MFU halves when sparse flag enabled with default 2× factor.test_throughput_monitor_sparse_model—ThroughputMonitorforwards both kwargs intoThroughput.All existing tests in the file still pass (37 passing + 5 pre-existing xfails unrelated to this change).